Abstract

The validity of statistical analyses applied to identify 
different factors in many fields depends upon the use of 
appropriate sample sizes, the lack of which reduces the 
power of the findings. However, the number of cases 
collected for data analysis in medical studies is generally 
limited, for medical, financial and other experimental 
reasons, and statistical tests are often carried out without 
power and sample size estimation. Power analysis involves 
several parameters, the most important of which, the effect 
size, reflects the degree of the effect expected to be found in 
the study. An easy-to-use MS Excel calculator has been
constructed to determine the effect size for chi-square tests
based on 2x2, 2x3 and 2x4 contingency tables. Its results
were compared with those given by GPower, R and, for a 2x2
table, SAS software, and its practical use is demonstrated
in three studies involving various data.

Introduction

The independence of two qualitative grouping
variables can be investigated by creating a contingency
table [Pearson, 1904]. p-values for the comparison of
these categorical variables can be calculated using
chi-square tests, the statistical method most frequently
used to detect differences between proportions
[Agresti, 2002], supported by most statistical software.

Power calculation has recently received increasing
attention in statistical analysis as an essential tool to
determine an appropriate sample size, or to estimate,
from preliminary data, the power that indicates the
reliability of the statistical test [Cohen, 1988; Gordon
et al., 2002]. Power is considered highly important in
experimental design, where it is applied before data
collection, but it is itself also based on preliminary
information (which can be obtained from a pilot study)
[Osmena, 2010].

Most statistical tests have their specific calculation
methods for power estimation; in this paper, we focus
on power calculations for the chi-square test, which
are well known [Cohen, 1988]. To determine the
power of a statistical test, a few input parameters have
to be set, the most crucial of them being the effect size.
The determination of effect size is a medical problem:
it is generally given by a physician or researcher
on the basis of earlier experience. When the examined
field is new and there are no previous results from
which an appropriate effect size can be derived, it can
be determined from new data, but only with the use of
a preliminary set.

The power of a statistical test is the probability of
correctly finding a difference between the investigated
variables, i.e. of rejecting the null hypothesis when it
is false.

There are several arguments that a post-hoc power
calculation is meaningless [Hoenig et al., 2001; Lenth,
2001]. In research in which power is calculated, this
is mostly done retrospectively, after the experiment has
been finished, which is a mistake: power analysis is
useful in the design phase. Hoenig et al. showed why
there is no reason to run a power analysis when a
result was significant [Hoenig et al., 2001]. Researchers
have to handle this topic with foresight.

Power analysis is available in several free software 
packages (e.g. R or GPower). A comparison of power 
analysis for the chi-square test in the R and SAS 
systems revealed both advantages and disadvantages 
of the possibilities in the methods applied by the two 
programs [Osmena, 2010]. However, these packages 
utilize effect size as an input parameter. If it is not 
given by an expert, but a preliminary data set is 
available, calculation of the effect size is possible. 

To simplify effect size calculation from a contingency 
table, an MS Excel calculator has been constructed 
(Appendix II), and compared with calculations in R (for 
a 2x2 contingency table, an R script is given in 
Appendix III), in GPower software, and for 2x2 
contingency tables in SAS. 

Other effect size calculators for the chi-square test are
freely available via the internet (including MS
Excel-based ones) [DeFife, 2009; Ellis, 2009; Wilson, 2001],
but these mostly work with the formula based on the
chi-square test statistic and the total number of cases,
rather than with the frequency data of a contingency
table, as applied here [Cohen, 1988]. These calculators are
mainly created for 2x2 contingency tables, but we have
extended the dimensions to 2x3 and 2x4 tables.

Aim 

Our primary aim was to demonstrate, in different
studies, the estimation of the effect size for power
analysis of chi-square tests from pilot data, using an
MS Excel-based calculator. A further goal was to run
power calculations on additional experimental data,
using the calculated effect size. The easy-to-use MS
Excel calculator was compared with other software (R,
GPower and SAS; see the section Programs Used in the
Methods) in order to verify our results.


Methods 


Definitions 

The null hypothesis of the chi-square test for
independence can be tested with the following statistic:

$$\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}$$

where

$O_i$ = observed frequency

$E_i$ = expected frequency, asserted by the null
hypothesis

$m$ = number of cells [Osmena, 2010].

"The distribution of the $\chi^2$-statistic follows a central
chi-square distribution when the null hypothesis is
true. When the null hypothesis is false it follows a
noncentral chi-square distribution with the
noncentrality parameter $\lambda$", where

$\lambda = (\text{effect size})^2 \times (\text{total sample size})$ [Osmena, 2010].

Therefore, the power can be defined as the
probability $P\left(\chi^2(df, \lambda) > \chi^2_{1-\alpha}(df)\right)$ [Osmena, 2010].
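
As a minimal sketch of this definition, the power can be evaluated directly with the noncentral chi-square functions of base R. The helper name chisq_power() and the example values (w = 0.3, N = 100, df = 1) are illustrative assumptions and do not come from the studies described below.

```r
# Power of the chi-square test, following the definition above.
# chisq_power() is a hypothetical helper name, not part of any package.
chisq_power <- function(w, N, df, alpha = 0.05) {
  lambda <- w^2 * N                 # noncentrality parameter: (effect size)^2 * N
  crit <- qchisq(1 - alpha, df)     # critical value of the central distribution
  # Upper-tail probability of the noncentral chi-square distribution
  pchisq(crit, df, ncp = lambda, lower.tail = FALSE)
}

# Illustrative values only: w = 0.3, 100 cases, 2x2 table (df = 1)
example_power <- chisq_power(w = 0.3, N = 100, df = 1)
print(round(example_power, 3))
```

With w = 0 the noncentrality parameter vanishes and the function returns the significance level itself, which is a quick sanity check of the definition.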

As the chi-square test is used to check the 
independence between investigated categorical 
variables, two-sided tests are carried out during 
analyses. 

The function pwr.chisq.test() in the pwr library of R 
uses the following parameters to perform power 
analysis: 

- Effect size 

- Total number of observations 

- Degrees of freedom 

- Level of significance 

- Power 

Effect size (w): This is the size of the effect that is
expected to be found in the study [Osmena, 2010].
From another aspect it is a pure number which
increases with the degree of discrepancy between the
distribution of the alternate hypothesis and that of the
null hypothesis (viz. the difference from the null
hypothesis that we expect to detect) [Cohen, 1988],
or in other words, the difference between the
investigated populations that is considered relevant in
the specific (i.e. clinical) field. From the
genetics point of view, w is a measure of the
separation of individual phenotypes based on their
genotypes [Gordon et al., 2005].

In the R program, the ES.w1() and ES.w2() functions of
the pwr package can be used to calculate w [Osmena,
2010].

Total number of observations (N): This is the number 
of all nonmissing cases in a study that compares two 
grouping variables (and thus the overall number of 
cases in the investigated contingency table). 

Degrees of freedom (df): This is the number of rows in
the contingency table minus 1 multiplied by the
number of columns in the contingency table minus 1
(viz. the number of cells that need to be known in the
contingency table in order to calculate the others,
given the row and column totals).

Level of significance (α): This is also called the Type I
error rate, which reflects the probability of a significant
statistical test when there is no real difference between
the compared populations (viz. the probability that the
null hypothesis is rejected although it is true: a
false-positive result).

Power (1-β): "The power of a statistical test is the
probability of correctly rejecting the null hypothesis
when it is false" [Osmena, 2010]. β is the probability of
failing to obtain a significant difference between the
investigated populations with the statistical test when
there is a real difference in the background. It is also
known as the Type II error rate, or the probability of a
false-negative result: the null hypothesis is false,
but the test fails to reject it. In each statistical test, a
low Type II error rate and thus a high power are
desired (as β approaches 0, 1-β approaches 1).

Any four of these parameters determine the fifth.
Hence, when we are interested in estimation of the
value of power, we set the values of w, N, df and α for
our study.
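
The same relation can be solved for a parameter other than power. The sketch below fixes N, df, α and a target power and solves numerically for the smallest detectable w. The helper name lambda_for_power() and the target power of 0.8 are assumptions made for this example; N = 362 and df = 1 are borrowed from Study 3.

```r
# Noncentrality parameter needed to reach a target power (hypothetical helper).
# The search interval assumes the required lambda lies below 50, which holds
# for the small df values and the power targets considered in this paper.
lambda_for_power <- function(df, power = 0.8, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df)
  gap <- function(lambda) {
    pchisq(crit, df, ncp = lambda, lower.tail = FALSE) - power
  }
  uniroot(gap, interval = c(0, 50), tol = 1e-9)$root
}

# Since lambda = w^2 * N, the smallest detectable effect size is sqrt(lambda / N)
N <- 362
w_min <- sqrt(lambda_for_power(df = 1) / N)
print(round(w_min, 3))
```

Solving for the noncentrality parameter first keeps the root search in a numerically comfortable range; w or N then follows from λ = w²N by simple algebra.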

Relationships between these parameters:

When w is large, 1-β is also large for fixed N, α and
df. By contrast, if w is small, 1-β will be low and
more samples are needed to detect the small w.

As α increases, β decreases and power increases, and
vice versa. As α increases, the N required decreases,
and as α decreases, the N required increases
[Osmena, 2010].
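
These relationships can be checked numerically. The following sketch evaluates the power over a few values of w and of α while the other parameters are held fixed; the grids and the fixed values (N = 100, df = 1, w = 0.3) are arbitrary choices for illustration, and chisq_power() is a hypothetical helper name.

```r
# Power from the noncentral chi-square distribution (hypothetical helper)
chisq_power <- function(w, N, df, alpha = 0.05) {
  pchisq(qchisq(1 - alpha, df), df, ncp = w^2 * N, lower.tail = FALSE)
}

# Power grows with w when N, alpha and df are fixed
w_grid <- c(0.1, 0.3, 0.5)
power_by_w <- sapply(w_grid, chisq_power, N = 100, df = 1)

# Power grows with alpha when w, N and df are fixed
alpha_grid <- c(0.01, 0.05, 0.10)
power_by_alpha <- sapply(alpha_grid,
                         function(a) chisq_power(0.3, 100, 1, alpha = a))

print(round(power_by_w, 3))
print(round(power_by_alpha, 3))
```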


To estimate the effect size for the chi-square test, an
MS Excel-based calculator has been constructed for
2x2, 2x3 and 2x4 contingency tables, applying Cohen's
formula

$$w = \sqrt{\sum_{i=1}^{m} \frac{(P_{1i} - P_{0i})^2}{P_{0i}}}$$

where:

$P_{1i}$: "the proportion in cell i posited by the alternate
hypothesis" (observed proportion of element i in the
cross-classification table).

$P_{0i}$: "the proportion in cell i posited by the null
hypothesis" (expected proportion of element i in the
cross-classification table).

m: the number of cells in the contingency table [Cohen,
1988; Osmena, 2010].
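
Cohen's formula can be sketched in a few lines of base R that work for a contingency table of any dimension. The function name cohen_w() is a hypothetical choice for this example; the counts are the Hungarian genotype frequencies listed in Table A2, and the expected proportions are taken as the products of the marginal proportions, as in a test of independence.

```r
# Cohen's w from a table of counts (hypothetical helper name)
cohen_w <- function(counts) {
  p1 <- counts / sum(counts)               # observed cell proportions (H1)
  p0 <- outer(rowSums(p1), colSums(p1))    # expected proportions under independence (H0)
  sqrt(sum((p1 - p0)^2 / p0))
}

# Hungarian DHCR24 rs600491 counts from Table A2
# (rows: genotypes; columns: healthy controls and AD patients)
hungarian <- matrix(c(19, 73, 47, 38, 102, 61), nrow = 3,
                    dimnames = list(Genotype = c("CC", "CT", "TT"),
                                    Group = c("HC", "AD")))

w_hungarian <- cohen_w(hungarian)
print(round(w_hungarian, 3))
```

Because the input is the frequency table itself, the same function covers the 2x2, 2x3 and 2x4 layouts of the Excel calculator without modification.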

Our MS Excel calculator was compared with the R and 
GPower programs. For the 2x2 contingency tables 
used in two studies, the power estimation was also 
compared with SAS, and an R script was written 
(Appendix III). 

Without calculating the effect sizes, Cohen's
suggested values for small, medium and large w (0.1,
0.3 and 0.5, respectively) can also be used, depending
on how large a difference is expected to be found in the
study [Cohen, 1988; Osmena, 2010].
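
To give a feeling for what these conventional values imply, the sketch below computes the total number of cases needed for each of them in a 2x2 table. The target power of 0.8, α = 0.05 and the helper name lambda_for_power() are assumptions of this example.

```r
# Noncentrality parameter needed to reach a target power (hypothetical helper;
# the search interval assumes the required lambda lies below 50)
lambda_for_power <- function(df, power = 0.8, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df)
  gap <- function(lambda) {
    pchisq(crit, df, ncp = lambda, lower.tail = FALSE) - power
  }
  uniroot(gap, interval = c(0, 50), tol = 1e-9)$root
}

# Cohen's conventional effect sizes
w_conventions <- c(small = 0.1, medium = 0.3, large = 0.5)

# lambda = w^2 * N, so the required N is lambda / w^2, rounded up
n_required <- ceiling(lambda_for_power(df = 1) / w_conventions^2)
print(n_required)
```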

The following sections present short descriptions of 
the three studies. 

Study 1: Alzheimer's Disease Data 

Alzheimer's disease (AD) is a neurodegenerative 
disorder, the most prevalent form of dementia [Selkoe, 
2001], in the development of which genetic factors 
play an important role. Identification of AD 
susceptibility genes would markedly promote an 
understanding of the pathophysiology of the disease 
[Bekris et al., 2010]. 

An important candidate gene for AD is
24-dehydrocholesterol reductase (DHCR24), the
polymorphism of which (rs600491) involves a single
nucleotide change resulting in 2 alleles (C and T) and 3
genotypes (CC, CT and TT) [Lamsa et al., 2007].

Lamsa et al. investigated the DHCR24 rs600491
polymorphism as regards the risk of AD in a Finnish
sample, and found that the CC, CT and TT genotype
frequencies did not differ significantly between the
AD and healthy control (HC) female groups
(N_HCfemale=274, N_ADfemale=289; P(CC|HC)=0.11,
P(CC|AD)=0.11; P(CT|HC)=0.43, P(CT|AD)=0.48;
P(TT|HC)=0.46, P(TT|AD)=0.41; Fisher's Exact test
p=0.44) [Lamsa et al., 2007]. A Hungarian examination
likewise revealed no significant difference in DHCR24
rs600491 genotype distribution between the AD and
HC female populations (N_HCfemale=139, N_ADfemale=201;
P(CC|HC)=0.137, P(CC|AD)=0.189; P(CT|HC)=0.525,
P(CT|AD)=0.508; P(TT|HC)=0.338, P(TT|AD)=0.303;
χ² test p=0.426) [Feher et al., 2012].

The effect size and power were calculated for both
studies of females, and the results are compared.

Study 2: Physiotherapeutic Data

During an informatics project between September 2009
and May 2011, a system was developed by Calculus
Ltd. and Polygon Informatics Ltd. to facilitate
physiotherapeutic examinations on the human body
with different wireless biosensors and with the use of
other parameters collected on patients. This
Physiosensor system is connected to an SAS server,
where the data collected and saved during the
examinations can be analyzed via an online user
interface. The first set of investigations in the project is
related to knee joint straightening. During the test, 3
EMG sensors with an earthing strap (measuring the
muscle activity), a goniometer (measuring the
flexibility of a joint) and an event marker (determining
time points) were used on specific muscles and parts of
the leg. The patient, in a sitting position, had to
straighten their knee according to a specific protocol.
Before the measurement, personal and other health
data were collected. Our analysis of the w and 1-β
estimation used only gender in comparison with a
grouping variable based on the birth date of the
patient. This comparison by a chi-square test resulted
in a nonsignificant p-value, and therefore the effect
size and statistical power could be investigated.

Study 3: Anthropometric Data 

Anthropometric parameters of first-year university
students (Faculty of Medicine, University of Szeged,
Hungary; aged around 18-19), including the hip and
waist circumferences, were measured three times in
2010. Groups formed from the averaged values of these
parameters were compared as count data in the
chi-square test, and w and 1-β were then estimated.

Programs Used 

GPower 3.1.5 (Heinrich-Heine-Universität, Düsseldorf,
Germany), RStudio 0.97.248 (RStudio Inc., Boston, MA,
USA) as a user-friendly interface for running R (R
Foundation for Statistical Computing, University of
Auckland, New Zealand) scripts, SAS 9.2 (SAS
Institute Inc., Cary, NC, USA) and IBM SPSS Statistics
20 (IBM Corp., Armonk, NY, USA) were used for
statistical analyses. The MS Excel-based calculator was
generated in Microsoft Office 2010 (Microsoft Hungary
Ltd., Budapest, Hungary).

Results 

Study 1: Alzheimer's Disease Data 

The Finnish dataset was based on a total of 563 female
patients (289 AD and 274 HC subjects; Table A1 in
Appendix I), while the Hungarian DHCR24 rs600491
genotype frequency data contained overall 340 female
patients (201 AD and 139 HC subjects; Table A2 in
Appendix I).

w was calculated with the ES.w2() R function, and the
observed power for the above two datasets was then
calculated with the pwr.chisq.test() function. The Finnish
dataset resulted in w_Finnish=0.054 with 1-β=0.1918
among 563 cases at a 0.05 significance level. To achieve
a power of 0.8 (80%), 3299 cases should be examined if
the other parameters remain constant.

The Hungarian dataset resulted in w_Hungarian=0.071 with
1-β=0.198 among 340 cases at a 0.05 significance level.
To achieve a power of 0.8, 1922 cases should be
examined if the other parameters remain constant.

Using w from the pilot (Finnish) dataset in the power
analysis of the Hungarian dataset, a power of 0.132
was obtained, which is slightly less than that
calculated with w based on its own data. However, all
the data indicate a very low level of power.
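
The figures reported for Study 1 can be retraced from the counts in Tables A1 and A2 with base R alone. The sketch below is not the script used in the paper (which relied on the pwr package); the helper names are hypothetical, and small differences in the last digit may arise from rounding.

```r
# Hypothetical helpers: Cohen's w, power, and the lambda needed for a target power
cohen_w <- function(counts) {
  p1 <- counts / sum(counts)
  p0 <- outer(rowSums(p1), colSums(p1))
  sqrt(sum((p1 - p0)^2 / p0))
}
chisq_power <- function(w, N, df, alpha = 0.05) {
  pchisq(qchisq(1 - alpha, df), df, ncp = w^2 * N, lower.tail = FALSE)
}
lambda_for_power <- function(df, power = 0.8, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df)
  gap <- function(lambda) {
    pchisq(crit, df, ncp = lambda, lower.tail = FALSE) - power
  }
  uniroot(gap, interval = c(0, 50), tol = 1e-9)$root
}

# Genotype counts (rows CC, CT, TT; columns HC, AD)
finnish   <- matrix(c(30, 118, 126, 32, 139, 118), nrow = 3)  # Table A1
hungarian <- matrix(c(19, 73, 47, 38, 102, 61), nrow = 3)     # Table A2
df <- 2                                                       # (3 - 1) * (2 - 1)

w_fin <- cohen_w(finnish)
w_hun <- cohen_w(hungarian)
power_fin <- chisq_power(w_fin, sum(finnish), df)
power_hun <- chisq_power(w_hun, sum(hungarian), df)

# Hungarian sample size combined with the pilot (Finnish) effect size
power_hun_pilot_w <- chisq_power(w_fin, sum(hungarian), df)

# Cases needed for a power of 0.8
n_fin <- ceiling(lambda_for_power(df) / w_fin^2)
n_hun <- ceiling(lambda_for_power(df) / w_hun^2)

print(round(c(w_fin, w_hun, power_fin, power_hun, power_hun_pilot_w), 4))
print(c(n_fin, n_hun))
```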

It is obvious that with a larger w, a smaller N is
sufficient to obtain the same level of power, when α
and df are fixed. The aim here was merely to
emphasize this difference in the calculation. As there is
no available clinically relevant w, we can use that
calculated from the preliminary Finnish dataset (this is
what people mostly do), as if it were the "true" effect
size, but it is not. As w from the Finnish dataset is
accessible, there is no need to calculate the w of the
Hungarian dataset. From this example, it can be seen
that w can differ between similar studies, so a
clinically relevant w value would be more appropriate.
Thus, it is highly important to handle post-hoc
calculations with foresight. Our comparisons were
therefore based on calculation.

Figure 1 depicts the observed power (on the vertical
axis) based on the Finnish data (empty dots) compared
with the Hungarian data (filled dots) versus the
number of cases (on the horizontal axis). The same
Cohen formula for w and 1-β estimation for the
chi-square tests [Cohen, 1988] as used in R is applied in
the commonly used GPower software [Erdfelder et al.,
1996]. Our MS Excel-based calculator gave the same
effect size of chi-square analysis as GPower for the
Hungarian data (Figure 2), but it is difficult to use
GPower for the estimation of w. Figure A1 in Appendix
I shows the R code with the result of the effect size and
power analysis compared with Figure 2, using the
same colors (magenta indicates effect size and orange
indicates the power in this paper).

As SAS calculates power only for 2x2 contingency
tables, a power analysis for the Alzheimer's disease
study could not be performed in SAS.

FIG. 1 POWER ESTIMATION (VERTICAL AXIS) FOR FINNISH AND HUNGARIAN FEMALE GENOTYPE FREQUENCY DATA WITH
INCREASING TOTAL NUMBER OF FEMALES (HORIZONTAL AXIS). α=0.05, df=2, w_Hungarian=0.071, w_Finnish=0.054. A GREATER POWER
CAN BE ACHIEVED WITH THE HUNGARIAN THAN WITH THE FINNISH DATA BASED ON THE SAME NUMBER OF PATIENTS.
FIGURE WAS CREATED IN R.



FIG. 2 COMPARISON OF OUR RESULTS WITH THE MS EXCEL CALCULATOR (LEFT-HAND PANEL) WITH THE GPOWER SOFTWARE
RESULTS. IN THE EXCEL CALCULATOR, ONLY THE RED-FRAMED CONTINGENCY TABLE NEEDS TO BE COMPLETED; THE OTHER
NUMBERS ARE CALCULATED AUTOMATICALLY. IN GPOWER w CAN BE CALCULATED BY USING THE PROBABILITIES FOR THE H0
(BLUE-FRAMED NUMBERS) AND H1 (GREEN-FRAMED NUMBERS) HYPOTHESES. THE RESULTING w (FROM THE RIGHT-HAND
PANEL OF THE GPOWER SOFTWARE) CAN BE TRANSFERRED TO THE MAIN WINDOW (LEFT-HAND PANEL OF THE GPOWER
SOFTWARE, IN THE CENTRAL PANEL OF THE FIGURE), AND POWER (ORANGE-FRAMED) CAN THEN BE DETERMINED. IT IS SEEN
THAT THE SAME ROUNDED EFFECT SIZE VALUES ARE CALCULATED WITH EXCEL AND GPOWER (MAGENTA-FRAMED).

Study 2: Physiotherapeutic Data

The physiotherapeutic dataset was collected from the
database of the Physiosensor system. As an increasing
number of institutes now use the system for various
medical studies (e.g. the University of Debrecen,
Debrecen, Hungary, for neurological patients, or a
hospital in Csepel, Hungary), we had to select those
cases performed in the Physiosensor laboratory (N=42).
Among the 42 observations, 34 involved the knee joint
straightening protocol between February 2 and April
19 in 2011. To create a grouping variable, we
differentiated the cases with respect to the median
value of the birth date: the group of those born in or
before 1986, and the group born in 1987 or later.

As a demonstration of power analysis, we selected
the data originating from before April 1, 2011 as a pilot
dataset, and then ran a chi-square test on the
29 cases thus selected. The contingency table for the
investigated physiotherapeutic dataset can be found in
Table A3 in Appendix I.

The Pearson chi-square test resulted in a two-sided 
asymptotic p-value of 0.573 (the 2-sided Fisher's Exact 
test gave p=0.715), showing no significant difference 
between the gender and birth date groups based on 
this dataset at a 0.05 significance level. 

The data selection, variable transformation and
chi-square tests were carried out in IBM SPSS Statistics 20.

w=0.105 was calculated with R, and with this value for
N=29 patients and df=1 (a 2x2 contingency table) at
α=0.05, 1-β=0.0872 was obtained.

For the overall period February-April 2011, the
contingency table is shown in Table A4 in Appendix I.
The Pearson chi-square test resulted in a two-sided
asymptotic p-value of 0.746 (the 2-sided Fisher's Exact
test gave p=1.000).

This extended study resulted in w=0.056 (almost half
of that based on the preliminary data), which at α=0.05
with df=1 leads to a power of 0.0621 for 34
observations. When the power for the extended
dataset (N=34) was calculated based on the effect size
of the preliminary dataset with 29 cases, a power of
0.0937 was found.
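
The three power values of this study can be approximated from the rounded effect sizes quoted above. The sketch below uses base R and a hypothetical helper name; because it starts from w rounded to three decimals, the results may differ from the reported values in the fourth decimal place.

```r
# Power from the noncentral chi-square distribution (hypothetical helper)
chisq_power <- function(w, N, df, alpha = 0.05) {
  pchisq(qchisq(1 - alpha, df), df, ncp = w^2 * N, lower.tail = FALSE)
}

# Preliminary dataset: its own w, N = 29
power_pilot <- chisq_power(w = 0.105, N = 29, df = 1)

# Extended dataset: its own w, N = 34
power_extended <- chisq_power(w = 0.056, N = 34, df = 1)

# Extended dataset with the effect size of the preliminary dataset
power_extended_pilot_w <- chisq_power(w = 0.105, N = 34, df = 1)

print(round(c(power_pilot, power_extended, power_extended_pilot_w), 3))
```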

All power estimations based on the physiotherapeutic
datasets, which involve a very low N, resulted in
almost no power for testing the independence of gender
and birth date group from the count data.

Figure 3 illustrates the comparison of the 1-β estimations
of the preliminary and extended datasets with increasing N
using the separately calculated w values for the pilot
study and the extended study. It is clear that, in the
event of a very low N for power and sample size
estimations, even a small increase in N can cause a
huge difference in power estimation.



FIG. 3 COMPARISON OF POWER ESTIMATION (VERTICAL AXIS)
OF PRELIMINARY (N=29) AND EXTENDED (N=34)
PHYSIOTHERAPEUTIC DATASETS WITH INCREASING
NUMBERS OF PATIENTS (HORIZONTAL AXIS). w IS
CALCULATED SEPARATELY FOR THE TWO STUDIES. α=0.05,
df=1, w_preliminary=0.105, w_extended=0.056. FIGURE WAS
GENERATED IN R.

Figure 4 presents a comparison of the results obtained
with the MS Excel-based calculator and GPower.
The effect size and power calculation results were also
compared with those from R; the script is presented in
Figure A2 in Appendix I.

1-β was also computed in SAS, using the w
obtained with the preliminary data for estimation of
the power for the extended dataset. The SAS syntax
for this analysis can be found in Figure A3 in Appendix
I. Figure 5 illustrates the result of the SAS syntax based
on Figure A3: at N=34 SAS calculated 1-β=0.092. The
same calculation in R resulted in 1-β=0.0937 (Figure A4
in Appendix I).


[Figure 5: output of the SAS POWER procedure, Pearson Chi-square Test for Two Proportions.
Fixed scenario elements: Distribution = Asymptotic normal; Method = Normal approximation;
Null Proportion Difference = 0; Alpha = 0.05; Group 1 Proportion = 0.467; Group 2 Proportion = 0.571;
Number of Sides = 2; Group 1 Weight = 1; Group 2 Weight = 1.
Computed Power table with the columns Index, N Total and Power; the row for N Total = 34 gives Power = 0.092.]


FIG. 5 THE RESULT OF POWER ANALYSIS IN THE SAS SYSTEM
FOR GROUP PROPORTIONS OF 0.467 AND 0.571 BASED ON
PRELIMINARY PHYSIOTHERAPEUTIC DATA WITH A
COMPUTED POWER FOR DIFFERENT NUMBERS OF CASES.

FIG. 4 COMPARISONS OF OUR RESULTS WITH THE MS EXCEL CALCULATOR (LEFT-HAND PANEL) WITH THE GPOWER SOFTWARE
RESULTS FOR PRELIMINARY PHYSIOTHERAPEUTIC DATA. IN THE EXCEL CALCULATOR, ONLY THE RED-FRAMED CONTINGENCY
TABLE NEEDS TO BE COMPLETED; THE OTHER NUMBERS ARE CALCULATED AUTOMATICALLY. IN GPOWER w CAN BE
CALCULATED BY USING THE PROBABILITIES FOR THE H0 (BLUE-FRAMED NUMBERS) AND H1 (GREEN-FRAMED NUMBERS)
HYPOTHESES. THE RESULTING w CAN BE TRANSFERRED (FROM THE RIGHT-HAND PANEL OF THE GPOWER SOFTWARE) TO THE
MAIN WINDOW (LEFT-HAND PANEL OF THE GPOWER SOFTWARE, IN THE CENTRAL PANEL OF THE FIGURE), AND POWER
(ORANGE-FRAMED) CAN THEN BE DETERMINED. IT IS SEEN THAT THE SAME ROUNDED EFFECT SIZE VALUES ARE CALCULATED
WITH EXCEL AND GPOWER (MAGENTA-FRAMED).


Study 3: Anthropometric Data 

In the anthropometric dataset from 2010 (N=362, 181
men and 181 women), to demonstrate the w and 1-β
calculation for the chi-square test, the mean waist and
mean hip circumference variables (measured in cm 3
times for each person) were divided into 2 categories
at their median values (90 cm for the hips, and 79 cm
for the waist). As the contingency table (Table A5 in
Appendix I) has no cells containing an expected value
less than 5, the assumption of the chi-square test is
met and the results can be interpreted.

Appendix I also contains results of Pearson's chi-square 
tests, showing a nonsignificant difference between 
these two categorical variables at a 5% significance 
level (p=0.076) in Table A6. 

Figure 6 depicts a bar chart of the frequencies
(contingency table), which can be a good graphical
representation of such count data. It is seen that when
the waist circumference is less than its median (79 cm),
slightly more samples also have a lower hip
circumference, and when the waist circumference is
greater than its median, more samples also have a
greater hip circumference (compared with the hip
median of 90 cm), a result that is probably to be
expected.



FIG. 6 BAR CHART OF FREQUENCY PARTITION AMONG 
WAIST AND HIP CIRCUMFERENCE CATEGORIES OF THE 
ANTHROPOMETRIC DATASET FROM 2010. COLUMNS 
REFLECT NUMBERS OF 104, 80, 84 AND 94 RESPECTIVELY (SEE 
TABLE A5 IN APPENDIX I). 

The initial dataset was in a CSV (Comma Separated
Values) format, which is usually used as the input of
statistical programs. Data were then imported into
IBM SPSS Statistics 20 software, and variable
computation, data preparation and first comparisons
were carried out in this software. Next, to investigate
the w and 1-β estimation for the chi-square test, our
MS Excel-based calculator was run, and the results
were compared with those of GPower, R software and
(as we have a 2x2 contingency table) SAS.

The MS Excel-based calculator and GPower gave
w=0.093, which resulted in 1-β=0.427 in GPower (Figure
7). These results were also obtained with a
representative dot plot (Figure 8) in R according to the
script shown in Appendix III.
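
For illustration, the anthropometric result can be retraced in base R from the four cell counts shown in Figures 6 and 7. This is a sketch with hypothetical helper names, not the script of Appendix III; the row and column labels follow the generic labels of the Excel sheet. It also uses the identity that N multiplied by w² equals the Pearson chi-square statistic, which links the effect size to the reported p-value.

```r
# Hypothetical helpers: Cohen's w and power of the chi-square test
cohen_w <- function(counts) {
  p1 <- counts / sum(counts)
  p0 <- outer(rowSums(p1), colSums(p1))
  sqrt(sum((p1 - p0)^2 / p0))
}
chisq_power <- function(w, N, df, alpha = 0.05) {
  pchisq(qchisq(1 - alpha, df), df, ncp = w^2 * N, lower.tail = FALSE)
}

# Cell counts of the 2x2 table (104, 80 / 84, 94)
anthro <- matrix(c(104, 84, 80, 94), nrow = 2,
                 dimnames = list(c("A", "B"), c("JJ", "KK")))
N <- sum(anthro)

w_anthro <- cohen_w(anthro)
power_anthro <- chisq_power(w_anthro, N, df = 1)

# N * w^2 is the Pearson chi-square statistic (no continuity correction)
chi2 <- N * w_anthro^2
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)

print(round(c(w = w_anthro, power = power_anthro, p = p_value), 3))
```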

Plots of the calculated power for given w, df,
N and α can also be generated with GPower software.
A comparison of power for significance levels α=0.01
and α=0.05 is presented with increasing total sample
size for anthropometric data in Figure 9. This makes it
clearer that, with stricter significance levels in the
power analysis (lower α), more samples must be
analyzed to obtain a higher power (closer to 1), if the
other parameters remain constant.
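
The effect of the significance level on the required sample size can also be expressed in numbers. The sketch below computes the number of cases needed for a power of 0.8 at both significance levels for the anthropometric effect size; the target power of 0.8 and the helper name are assumptions of this example.

```r
# Noncentrality parameter needed to reach a target power (hypothetical helper;
# the search interval assumes the required lambda lies below 50)
lambda_for_power <- function(df, power = 0.8, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df)
  gap <- function(lambda) {
    pchisq(crit, df, ncp = lambda, lower.tail = FALSE) - power
  }
  uniroot(gap, interval = c(0, 50), tol = 1e-9)$root
}

w <- 0.0933643            # effect size of the anthropometric data
alphas <- c(0.01, 0.05)

# Required N = lambda / w^2, rounded up, for each significance level
n_required <- sapply(alphas, function(a) {
  ceiling(lambda_for_power(df = 1, alpha = a) / w^2)
})
names(n_required) <- paste0("alpha=", alphas)
print(n_required)
```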

Naturally, Figures 8 and 9 indicate the same result
for α=0.05; the results from the two software packages
are therefore comparable.

Conclusions

When a clinically relevant effect size as the "true"
estimate of the effect in our study is unknown, the effect
size can be calculated from the preliminary dataset
(this can also differ from the true effect size). Without
previous results, w is generally calculated on the basis
of the running study, as if that w were the "truth".
Different software packages were employed to perform
the same calculation, but the interpretation of these
results should be carefully considered.

In conclusion, as the usage of GPower to perform
power analyses may be difficult for those without
expertise in the field of statistics, an easy-to-use MS
Excel calculator based on 2x2, 2x3 and 2x4 contingency
tables is suggested in addition to other software. An R
script for the 2x2 contingency table has been generated
and such calculations have been applied to different
practical studies. With such techniques, it is not
difficult to calculate the effect size and power for
chi-square tests, and these calculations may feasibly be
integrated into routine statistical analyses for
researchers with biological or other backgrounds.


[Figure 7, left-hand panel: MS Excel calculator, sheet ES_2x2 (the workbook also contains the sheets ES_2x3 and ES_2x4)]

Effect size calculation of parameters for 2x2 groups by Cohen: preliminary study

Fill only the red-framed part of this sheet to get the effect size of your frequency data in the green cell.

Cross-classification table (number in each cell of the contingency table)

                          Grouping variable 2
                          JJ      KK      Sum
Grouping variable 1   A   104     80      184
                      B   84      94      178
                    Sum   188     174     362

Probability of each cell with preliminary observed values, P(H1)

                          Grouping variable 2
                          JJ          KK          Sum
Grouping variable 1   A   0.2872928   0.2209945   0.51
                      B   0.2320442   0.2596685   0.49
                    Sum   0.52        0.48        1.00

Probability of no association between the 2 investigated factors
(expected proportions of grouping variables 1 and 2), P(H0)

                          Grouping variable 2
                          JJ          KK          Sum
Grouping variable 1   A   0.2639724   0.2443149   0.51
                      B   0.2553646   0.2363481   0.49
                    Sum   0.52        0.48        1.00

Effect size (w), Ai = (P1i - P0i)^2 / P0i

                          Grouping variable 2
                          JJ            KK
Grouping variable 1   A   0.002060221   0.002225986
                      B   0.002129667   0.00230102

w = sqrt(sum of Ai) = 0.093

[Figure 7, right-hand panel: GPower main window and effect size drawer]

Test family: χ² tests
Statistical test: Goodness-of-fit tests: Contingency tables
Type of power analysis: Post hoc: Compute achieved power - given α, sample size, and effect size

Input parameters:
  Effect size w        0.0933643
  α err prob           0.05
  Total sample size    362
  Df                   1

Output parameters: Noncentrality parameter λ, Critical χ², Power (1-β err prob)



FIG. 7 COMPARISON OF THE RESULTS OF EFFECT SIZE CALCULATION (MAGENTA) WITH THE MS EXCEL-BASED CALCULATOR
(LEFT-HAND PANEL) AND GPOWER SOFTWARE (RIGHT-HAND PANEL) BASED ON THE CONTINGENCY TABLE (SET IN THE
RED-FRAMED TABLE IN EXCEL, UPPER LEFT) OF THE ANTHROPOMETRIC DATA. PROBABILITIES CALCULATED FOR OBSERVED AND
EXPECTED VALUES IN THE MS EXCEL CALCULATOR ARE FRAMED IN GREEN AND BLUE, RESPECTIVELY, AND ALSO SHOWN FOR
GPOWER (RIGHT PANEL OF GPOWER). THE OBSERVED POWER ESTIMATION IN GPOWER SOFTWARE IS FRAMED IN ORANGE.

[Figure 8: dot plot titled "Power estimation with increasing number of cases for anthropometric frequency dataset"; horizontal axis: Number of students (0 to 2000).]


FIG. 8 DOT PLOT COMPARING POWER ESTIMATION
(VERTICAL AXIS) WITH INCREASING NUMBER OF CASES
(HORIZONTAL AXIS) FOR ANTHROPOMETRIC DATA,
DEMONSTRATING THAT TO ACHIEVE A POWER OF ABOUT 80%
ALMOST 1000 CASES SHOULD BE ANALYZED UNDER THE
SAME CONDITIONS (df=1, α=0.05, w=0.093).



FIG. 9 COMPARISON OF POWER ESTIMATION (VERTICAL AXIS)
FOR α=0.01 (RED LINE) AND α=0.05 (BLUE LINE) WITH
INCREASING TOTAL SAMPLE SIZE (HORIZONTAL AXIS) FOR
ANTHROPOMETRIC DATA (df=1, w=0.0933643). THIS PLOT WAS
GENERATED BY GPOWER.

APPENDIX I

Study 1: Alzheimer's Disease Data 

Table A1 is a contingency table for Finnish Alzheimer's
disease (AD) vs. healthy control (HC) cases (columns)
by DHCR24 rs600491 genotypes (rows). In each cell,
the 1st row relates to the number of females, the 2nd
row to the column total percentages and the 3rd row to
the table total percentages. This table was created in R
software.

TABLE A1 CONTINGENCY TABLE FOR FINNISH AD VS. HC
CASES BY DHCR24 RS600491 GENOTYPES. 1ST ROW: NUMBER OF
FEMALE CASES, 2ND ROW: COLUMN TOTAL PERCENTAGES,
3RD ROW: TABLE TOTAL PERCENTAGES IN EACH CELL.


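              | Alzheimer
    Genotypes |        HC |        AD | Row Total
--------------|-----------|-----------|-----------
           CC |        30 |        32 |        62
              |     0.109 |     0.111 |
              |     0.053 |     0.057 |
--------------|-----------|-----------|-----------
           CT |       118 |       139 |       257
              |     0.431 |     0.481 |
              |     0.210 |     0.247 |
--------------|-----------|-----------|-----------
           TT |       126 |       118 |       244
              |     0.460 |     0.408 |
              |     0.224 |     0.210 |
--------------|-----------|-----------|-----------
 Column Total |       274 |       289 |       563
              |     0.487 |     0.513 |
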


Table A2 is the contingency table for Hungarian
Alzheimer's disease vs. healthy control cases (columns)
by DHCR24 rs600491 genotypes (rows). In each cell,
the 1st row relates to the number of females, the 2nd
row to the column total percentages and the 3rd row to
the table total percentages. This table was created in R
software.

TABLE A2 CONTINGENCY TABLE FOR HUNGARIAN AD VS. HC 
CASES BY DHCR24 RS600491 GENOTYPES. 1STROW: NUMBER OF 
FEMALE CASES, 2ND ROW: COLUMN TOTAL PERCENTAGES, 
3RD ROW: TABLE TOTAL PERCENTAGES IN EACH CELL. 


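             | Alzheimer 
   Genotypes |        HC |        AD | Row Total | 
-------------|-----------|-----------|-----------| 
          CC |        19 |        38 |        57 | 
             |     0.137 |     0.189 |           | 
             |     0.056 |     0.112 |           | 
-------------|-----------|-----------|-----------| 
          CT |        73 |       102 |       175 | 
             |     0.525 |     0.507 |           | 
             |     0.215 |     0.300 |           | 
-------------|-----------|-----------|-----------| 
          TT |        47 |        61 |       108 | 
             |     0.338 |     0.303 |           | 
             |     0.138 |     0.179 |           | 
-------------|-----------|-----------|-----------| 
Column Total |       139 |       201 |       340 | 
             |     0.409 |     0.591 |           | 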



The calculation of w and 1-β for the chi-square test for 
the Hungarian Alzheimer dataset in R is 
shown in Figure A1. The freq2 object holds the same 
contingency table as shown in Table A2. In the 
prob3 object, we calculated the probabilities for the 
overall 340 cases. R calculated the same w and 1-β as 
GPower (colored identically in 
the outputs of the two programs: magenta indicates 
w = 0.071, and orange indicates 1-β = 0.198; see Figure 2). 


> freq2 
         Alzheimer 
Genotypes HC  AD 
       CC 19  38 
       CT 73 102 
       TT 47  61 
> prob3<-freq2/340 
> prob3 
         Alzheimer 
Genotypes         HC        AD 
       CC 0.05588235 0.1117647 
       CT 0.21470588 0.3000000 
       TT 0.13823529 0.1794118 
> ES.w2(prob3) 
[1] 0.07080762 
> pwr.chisq.test(w=ES.w2(prob3), N=340, df=2, sig.level=0.05, power=) 

     Chi squared power calculation 

              w = 0.07080762 
              N = 340 
             df = 2 
      sig.level = 0.05 
          power = 0.198 

NOTE: N is the number of observations 


FIG. A1 EFFECT SIZE (MAGENTA-FRAMED) AND POWER 
(ORANGE-FRAMED) CALCULATION FOR HUNGARIAN 
ALZHEIMER'S DISEASE DATASET IN R SOFTWARE. 
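
The values in Figure A1 can be cross-checked outside R. The following Python sketch is not the authors' script: it assumes that Cohen's w is the square root of the sum of (p_ij - p_i. p_.j)^2 / (p_i. p_.j) over all cells, and it evaluates the noncentral chi-square tail with the closed form that exists for even degrees of freedom (here df = 2). The function names are illustrative only.

```python
import math

def cohen_w(table):
    """Cohen's w for a contingency table of counts (list of rows)."""
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    w2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected_p = row_p[i] * col_p[j]  # cell probability under independence
            w2 += (count / n - expected_p) ** 2 / expected_p
    return math.sqrt(w2)

def central_chi2_sf_even(x, df):
    """P(X > x) for a central chi-square variable with even df (closed form)."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total

def chi2_critical_even(df, alpha):
    """Critical value of the central chi-square (even df), found by bisection."""
    lo, hi = 0.0, 1.0
    while central_chi2_sf_even(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if central_chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_even_df(w, n, df, alpha=0.05, terms=100):
    """Power of the chi-square test for even df.

    The noncentral chi-square (noncentrality n * w^2) is written as a
    Poisson-weighted mixture of central chi-square distributions.
    """
    if df % 2:
        raise ValueError("this sketch only handles even df")
    lam = n * w ** 2
    crit = chi2_critical_even(df, alpha)
    weight = math.exp(-lam / 2.0)  # Poisson(lam/2) probability of j = 0
    power = 0.0
    for j in range(terms):
        power += weight * central_chi2_sf_even(crit, df + 2 * j)
        weight *= (lam / 2.0) / (j + 1)
    return power

hungarian = [[19, 38], [73, 102], [47, 61]]  # counts from Table A2
w = cohen_w(hungarian)
print(round(w, 4), round(power_even_df(w, 340, 2), 2))
```

With the counts of Table A2 this gives w of about 0.0708, in line with the ES.w2() output, and a power of roughly 0.2 for N = 340.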

Study 2: Physiotherapeutic Data 

Table A3 is the contingency table for the 
physiotherapeutic data, comparing the birth date 
grouping with gender. Overall, 29 cases were 
investigated between February 2 and April 1, 2011. 
This table was generated in IBM SPSS Statistics software. 

TABLE A3 CONTINGENCY TABLE FOR THE COMPARISON OF 
GENDER AND BIRTH DATE GROUPS FOR THE 
PHYSIOTHERAPEUTIC DATASET MEASURED BETWEEN 
FEBRUARY 2 AND APRIL 1 IN 2011. 


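Birth date * Gender Crosstabulation 

                              |       Gender      | 
Birth date                    |   Male  |  Female |  Total 
------------------------------|---------|---------|-------- 
After 1986    Count           |     7   |     8   |    15 
              Expected Count  |   7.8   |   7.2   |  15.0 
              % within Gender |  46.7%  |  57.1%  |  51.7% 
------------------------------|---------|---------|-------- 
Before 1987   Count           |     8   |     6   |    14 
              Expected Count  |   7.2   |   6.8   |  14.0 
              % within Gender |  53.3%  |  42.9%  |  48.3% 
------------------------------|---------|---------|-------- 
Total         Count           |    15   |    14   |    29 
              Expected Count  |  15.0   |  14.0   |  29.0 
              % within Gender | 100.0%  | 100.0%  | 100.0% 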


The contingency table for the physiotherapeutic data, 
comparing the birth date grouping with gender, is 
shown in Table A4. Overall, 34 cases were investigated 
between February 2 and April 19, 2011. This table was 
generated in IBM SPSS Statistics software. 

TABLE A4 CONTINGENCY TABLE FOR THE COMPARISON OF 
GENDER AND BIRTH DATE GROUPS FOR THE EXTENDED 
PHYSIOTHERAPEUTIC DATASET MEASURED BETWEEN 
FEBRUARY 2 AND APRIL 19 IN 2011. 



Birth date * Gender Crosstabulation 

                              |       Gender      | 
Birth date                    |   Male  |  Female |  Total 
------------------------------|---------|---------|-------- 
After 1986    Count           |     8   |    10   |    18 
              Expected Count  |   8.5   |   9.5   |  18.0 
              % within Gender |  50.0%  |  55.6%  |  52.9% 
------------------------------|---------|---------|-------- 
Before 1987   Count           |     8   |     8   |    16 
              Expected Count  |   7.5   |   8.5   |  16.0 
              % within Gender |  50.0%  |  44.4%  |  47.1% 
------------------------------|---------|---------|-------- 
Total         Count           |    16   |    18   |    34 
              Expected Count  |  16.0   |  18.0   |  34.0 
              % within Gender | 100.0%  | 100.0%  | 100.0% 

Calculations of w and 1-β for the chi-square test for the 
preliminary Physiosensor frequency dataset in R 
are shown in Figure A2. The structure is 
similar to that in Figure A1. In this short R script, the 
freq_p1 object holds the contingency table shown in 
Table A3, and the prob_p1 object holds the probabilities 
calculated for the overall 29 cases. R calculated 
the same w and 1-β as GPower 
(colored identically in the outputs of the two programs: 
magenta indicates w = 0.105, and orange indicates 1-β 
= 0.087; Figure 4). 

> freq_p1 
             Gender 
Birthdate     Male Female 
  After 1986     7      8 
  Before 1987    8      6 
> prob_p1 
             Gender 
Birthdate          Male    Female 
  After 1986  0.2413793 0.2758621 
  Before 1987 0.2758621 0.2068966 
> ES.w2(prob_p1) 
[1] 0.1047619 
> pwr.chisq.test(w=ES.w2(prob_p1), N=N_p1, df=1, sig.level=0.05, power=) 

     Chi squared power calculation 

              w = 0.1047619 
              N = 29 
             df = 1 
      sig.level = 0.05 
          power = 0.08718613 

NOTE: N is the number of observations 


FIG. A2 EFFECT SIZE (MAGENTA-FRAMED) AND POWER 
(ORANGE-FRAMED) CALCULATION FOR PRELIMINARY 
PHYSIOTHERAPEUTIC DATASET (N = 29) IN R SOFTWARE. 

The SAS code for power analysis with different 
sample sizes, based on the group proportions of the 
2x2 contingency table of the preliminary Physiosensor 
dataset and covering the number of cases in the 
extended dataset (N = 34), is shown in Figure A3. 


/* Power analysis for 2 by 2 contingency table */ 

/* contingency table for physiosensor preliminary dataset: 
   7 8 
   8 6 
*/ 

proc power; 
   twosamplefreq test=pchi                   /* Pearson Chi-square test */ 
      alpha=0.05                              /* significance level */ 
      groupproportions=(0.467 0.571)          /* proportions for columns: 7/(7+8) 8/(8+6) */ 
      nullproportiondiff=0                    /* testing the null hypothesis: there is no difference in proportions */ 
      ntotal=2 to 34 by 4 50 to 1700 by 50    /* checking power estimation for different sample sizes */ 
      power=.;                                /* power estimation: 1-beta */ 
   plot x=n;   /* horizontal axis represents the number of cases, vertical axis shows the power */ 
run; 


FIG. A3 SAS CODE FOR POWER ANALYSIS FOR A 2x2 CONTINGENCY TABLE BASED ON THE PRELIMINARY PHYSIOTHERAPEUTIC 
DATASET. THE POWER IS CALCULATED ON THE BASIS OF THE PROPORTIONS IN THE PRELIMINARY DATASET, BUT RUN FOR 
THE NUMBER OF CASES (N = 34) FROM THE EXTENDED DATASET. 

The power calculation in R for the extended (N = 34) 
dataset, using the effect size based on 
the preliminary physiotherapeutic data, is presented in 
Figure A4. The analysis results in 1-β = 0.0937. 

> pwr.chisq.test(w=ES.w2(prob_p1), N=N_p2, df=1, sig.level=0.05, power=) 

     Chi squared power calculation 

              w = 0.1047619 
              N = 34 
             df = 1 
      sig.level = 0.05 
          power = 0.09372487 

NOTE: N is the number of observations 

FIG. A4 POWER ANALYSIS BASED ON w = 0.105 FROM THE 
PRELIMINARY DATA FOR THE EXTENDED (N = 34) DATASET. 
1-β = 0.0937. THE CALCULATION WAS PERFORMED IN R. 
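
For a 2x2 table (df = 1), the power values in Figures A2 and A4 can also be reproduced with the standard normal distribution alone, because a chi-square variable with one degree of freedom and noncentrality N*w^2 is the square of a normal variable with mean sqrt(N)*w. The Python sketch below relies on that identity; it is an independent check, not the routine used by the pwr package, and the function name power_df1 is illustrative.

```python
import math
from statistics import NormalDist

def power_df1(w, n, alpha=0.05):
    """Power of a chi-square test with df = 1, effect size w and n cases."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    shift = math.sqrt(n) * w           # square root of the noncentrality n * w^2
    # Reject when |Z + shift| exceeds z_crit, with Z standard normal.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

w_prelim = 0.1047619  # effect size from the preliminary data (Figure A2)
print(round(power_df1(w_prelim, 29), 4))  # preliminary dataset, N = 29
print(round(power_df1(w_prelim, 34), 4))  # extended dataset, N = 34
```

The two printed values should agree with the powers of about 0.087 and 0.094 reported for N = 29 and N = 34.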

Study 3: Anthropometric Data 

The contingency table of the anthropometric dataset 
from 2010, comparing waist and hip circumference 
groups, is shown in Table A5 (total number of cases 
362). Each cell contains the number of observed cases, 
the expected value and the row percentage of each 
pair of comparisons. This contingency table was 
created with IBM SPSS Statistics 20. 

TABLE A5 CONTINGENCY TABLE FOR THE 
ANTHROPOMETRIC DATASET COMPARING THE WAIST AND 
HIP CIRCUMFERENCE CATEGORIES (N= 362). 


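Category of Waist * Category of Hip Crosstabulation 

                                                       |          Category of Hip           | 
                                                       | Greater than or  | Lower than      | 
Category of Waist                                      | equal to Median  | Median          |  Total 
-------------------------------------------------------|------------------|-----------------|-------- 
Greater than or     Count                              |       104        |        80       |   184 
equal to Median     Expected Count                     |      95.6        |      88.4       | 184.0 
                    % within Category of Waist         |      56.5%       |      43.5%      | 100.0% 
-------------------------------------------------------|------------------|-----------------|-------- 
Lower than Median   Count                              |        84        |        94       |   178 
                    Expected Count                     |      92.4        |      85.6       | 178.0 
                    % within Category of Waist         |      47.2%       |      52.8%      | 100.0% 
-------------------------------------------------------|------------------|-----------------|-------- 
Total               Count                              |       188        |       174       |   362 
                    Expected Count                     |     188.0        |     174.0       | 362.0 
                    % within Category of Waist         |      51.9%       |      48.1%      | 100.0% 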


Table A6 presents the results of the chi-square tests on the 
anthropometric dataset from 2010. Pearson's chi- 
square test gives an asymptotic p-value of 0.076. As 
none of the cells in the contingency table has an 
expected value of less than 5, this chi-square test can be 
interpreted. This table was generated with IBM SPSS 
Statistics 20. 

TABLE A6 RESULTS OF CHI-SQUARE TESTS IN SPSS BASED ON 
THE CONTINGENCY TABLE SHOWN IN TABLE A5. 


Chi-Square Tests 

                         | Value    | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) 
-------------------------|----------|----|-----------------------|----------------------|--------------------- 
Pearson Chi-Square       | 3.156(a) | 1  | .076                  |                      | 
Continuity Correction(b) | 2.793    | 1  | .095                  |                      | 
Likelihood Ratio         | 3.160    | 1  | .075                  |                      | 
Fisher's Exact Test      |          |    |                       | .092                 | .047 
N of Valid Cases         | 362      |    |                       |                      | 

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 85.56. 

b. Computed only for a 2x2 table 
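
The Pearson chi-square row of Table A6 and the expected counts of Table A5 can be recomputed directly from the four observed counts. The Python sketch below is an illustration rather than the SPSS procedure: it computes the uncorrected Pearson statistic, obtains the asymptotic p-value through the df = 1 identity with the standard normal distribution, and derives the effect size as w = sqrt(chi-square / N). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def pearson_chi2_2x2(table):
    """Return (chi-square, expected counts, asymptotic p-value) for a 2x2 table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # With df = 1 the chi-square variable is a squared standard normal.
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, expected, p_value

observed = [[104, 80], [84, 94]]  # counts from Table A5
chi2, expected, p = pearson_chi2_2x2(observed)
n_total = sum(sum(row) for row in observed)
min_expected = min(min(row) for row in expected)
w = math.sqrt(chi2 / n_total)     # Cohen's w from the chi-square statistic
print(round(chi2, 3), round(p, 3), round(min_expected, 2), round(w, 4))
```

The minimum expected count confirms that the assumption of the chi-square test (no expected value below 5) holds for this table.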
APPENDIX II 

An MS Excel calculator for Effect Size for chi-square 
power analysis for 2x2, 2x3 and 2x4 contingency tables 
is available online 
(http://www3.szote.u-szeged.hu/dmi/Anna_Laszlo/Effect-size_calculator_for_Chi-square_power.xlsx). 

APPENDIX III 

As a third appendix, an R script with explanatory 
comments for effect size and power estimation of the 
anthropometric data is given in Figure A5. The code 
has 5 main parts (each indicated by ### at the beginning 
of the comment): 

1. Creating the contingency table (the 2x2 matrix 
freq_a1) by waist and hip circumference groups. 
The user only has to set the frequencies 
for this contingency table, using r1c1_a1, 
r1c2_a1, r2c1_a1 and r2c2_a1 as the counts in the 
cells, where r and c denote row 
and column, and the numbers indicate the 
row and column in question. For example, r2c1 is 
the cell at the intersection of the second row 
and the first column of the contingency table. 
The suffix "_a1" denotes that these counts 
are from a preliminary study. 

2. Running the chi-square test and comparing the 
results with those from IBM SPSS Statistics 
(Appendix I; tables of Study 3) to check the 
calculations. The contingency table 
is compared by using the CrossTable() function 
from the gmodels package. The chi-square test 
assumption is checked by calculating the 
expected values from the contingency table. 
Fisher's exact test and the chi-square test with 
the Yates continuity correction are compared 
between the two programs; they give the same 
results. 

3. For the effect size calculation for the chi-square 
test, the probabilities have to be calculated 
(the 2x2 matrix prob_a1) and the pwr package has 
to be loaded to allow use of the ES.w2() 
function for effect size estimation. 

4. Power is estimated with the pwr.chisq.test() 
function from the pwr package. To run this 
estimation, the effect size (from the previous 
calculation by the ES.w2() function), the degrees of 
freedom for the contingency table ((number of 
rows - 1)*(number of columns - 1)), the number of 
cases (N_a1 is the sum of the elements in the 
contingency table) and the significance level (α, set 
to 0.05) have to be specified. If the power is specified 
instead of the number of cases, the function estimates 
how many cases would be needed to reach a given 
power (e.g. 0.8, 0.9 or 0.95), 
assuming the same effect size, significance 
level and degrees of freedom. 

5. A dot plot is drawn representing the power 
with increasing number of cases for the same 
study parameters (w = 0.09336431, α = 0.05, df = 1, as 
we have a 2x2 contingency table). This is 
shown in Figure 8. 
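
The sample size step described in part 4 can be illustrated independently of the pwr package. The Python sketch below is not the authors' code: it uses the df = 1 normal identity for the power and a plain bisection on N (rather than the root finder used by pwr.chisq.test()), and rounds the result up to the next whole case. The names power_df1 and required_n are illustrative.

```python
import math
from statistics import NormalDist

def power_df1(w, n, alpha=0.05):
    """Power of a chi-square test with df = 1, effect size w and n cases."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * w  # square root of the noncentrality n * w^2
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def required_n(w, target_power, alpha=0.05):
    """Smallest (fractional) N whose power reaches target_power, by bisection."""
    lo, hi = 1.0, 2.0
    while power_df1(w, hi, alpha) < target_power:
        hi *= 2.0             # expand the bracket until the target is reached
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if power_df1(w, mid, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi

w = 0.09336431  # effect size of the anthropometric dataset
for target in (0.7, 0.8, 0.9, 0.95, 0.99):
    print(target, math.ceil(required_n(w, target)))
```

Rounded up, these estimates can be compared with the case numbers given as comments in the sample size section of the script in Figure A5.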
#2013-01-09 
#Effect size and power estimation for chi-square test 
#Anthropometric data 

###2x2 contingency table for pilot study: dataset 2010 
r1c1_a1=94    #element of first row and first column 
r1c2_a1=84    #element of first row and second column 
r2c1_a1=80    #element of second row and first column 
r2c2_a1=104   #element of second row and second column 
              #(a1: anthropometric dataset 1) 

freq_a1<-matrix(c(r1c1_a1, r1c2_a1, 
                  r2c1_a1, r2c2_a1), 
                nrow=2, byrow=TRUE, 
                dimnames = list(Waistcircumference = c("< Median", "Median <="), 
                                Hipcircumference = c("< Median", "Median <=")) 
                ) 
freq_a1 
#                  Hipcircumference 
#Waistcircumference < Median Median <= 
#         < Median        94        84 
#         Median <=       80       104 

###checking results in SPSS: 
install.packages("gmodels")   #install packages if needed 
library(gmodels)              #after installing gmodels package 
CrossTable(freq_a1, prop.c=FALSE, prop.chisq=FALSE)   #row %s are ok 
fisher.test(freq_a1)          #Fisher's Exact Test for Count Data p=0.09219, the same as in SPSS 
chisq.test(freq_a1)$expected  #the same as in SPSS 
chisq.test(freq_a1)           #X-squared = 2.7928, df = 1, p-value = 0.09469 
                              #the same as in SPSS's continuity correction chi-square test 

###Effect size calculation 
#number of cases 
N_a1=r1c1_a1+r1c2_a1+r2c1_a1+r2c2_a1   #362 - number of cases (overall frequency) 
#probability of observed values in the contingency table (added up to 1; each is nij/N in cell ij): 
prob_a1 <- freq_a1/N_a1 
prob_a1 
#                  Hipcircumference 
#Waistcircumference  < Median Median <= 
#         < Median  0.2596685 0.2320442 
#         Median <= 0.2209945 0.2872928 
library(pwr) 
ES.w2(prob_a1)   #w=0.09336431 

###Power and sample size estimation 
#power estimation 
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=N_a1, sig.level=.05, power=)   #power=0.4272621 
#sample size estimation 
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=, sig.level=.05, power=0.7)$N    #709 
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=, sig.level=.05, power=0.8)$N    #901 
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=, sig.level=.05, power=0.9)$N    #1206 
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=, sig.level=.05, power=0.95)$N   #1491 
pwr.chisq.test(w=ES.w2(prob_a1), df=(2-1)*(2-1), N=, sig.level=.05, power=0.99)$N   #2108 

###Plot of power compared with increasing number of cases 
N_all<-as.numeric()   #number of cases 
P_a1<-as.numeric()    #power 
for (i in 1:19) { 
  N_all[i]=pwr.chisq.test(w=ES.w2(prob_a1), N=, df=1, sig.level=0.05, power=(0.01+i*0.05))$N 
  P_a1[i]=0.01+i*0.05 
} 
plot(N_all, P_a1, 
     main="Power estimation with increasing number of cases\nfor anthropometric frequency dataset", 
     xlab="Number of students", 
     ylab="Power estimation", 
     type="p", 
     col="dark red", 
     xlim=c(0,2200), 
     ylim=c(0,1) 
) 

FIG. A5 R SCRIPT FOR EFFECT SIZE AND POWER CALCULATION FOR THE ANTHROPOMETRIC DATASET 